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Genome-wide pharmacogenomic study of citalopram- 
induced side effects in STAR*D 

DE Adkins 1 , SL Clark 1 , K Aberg 1 , JM Hettema 2 , J Bukszar 1 , JL McClay 1 , RP Souza 3 and EJCG van den Oord 1 

Affecting about 1 in 1 2 Americans annually, depression is a leading cause of the global disease burden. While a range of effective 
antidepressants are now available, failure and relapse rates remain substantial, with intolerable side effect burden the most 
commonly cited reason for discontinuation. Thus, understanding individual differences in susceptibility to antidepressant 
therapy side effects will be essential to optimize depression treatment. Here we perform genome-wide association studies 
(GWAS) to identify genetic variation influencing susceptibility to citalopram-induced side effects. The analysis sample consisted 
of 1762 depression patients, successfully genotyped for 421 K single-nucleotide polymorphisms (SNPs), from the Sequenced 
Treatment Alternatives to Relieve Depression (STAR*D) study. Outcomes included five indicators of citalopram side effects: 
general side effect burden, overall tolerability, sexual side effects, dizziness and vision/hearing side effects. Two SNPs met our 
genome-wide significance criterion (q< 0.1), ensuring that, on average, only 10% of significant findings are false discoveries. In 
total, 12 additional SNPs demonstrated suggestive associations (q< 0.5). The top finding was rs1 71 35437, an intronic SNP within 
EMID2, mediating the effects of citalopram on vision/hearing side effects (P= 3.27 x 10" 8 , q= 0.026). The second genome-wide 
significant finding, representing a haplotype spanning ~30 kb and eight genotyped SNPs in a gene desert on chromosome 13, 
was associated with general side effect burden (P=3.22 x 10" 7 , q = 0.096). Suggestive findings were also found for SNPs at 
LAMA1, AOX2P, EGFLAM, FHIT and RTP2. Although our findings require replication and functional validation, this study 
demonstrates the potential of GWAS to discover genes and pathways that potentially mediate adverse effects of antidepressant 
medications. 
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Introduction 

Associated with substantial co-morbidity 1 and increased 
mortality, 2 major depressive disorder (MDD) imposes im- 
mense costs in human suffering and economic productivity. 
While the development of multiple classes of antidepressant 
medications has greatly improved MDD treatment in recent 
years, 3 ' 4 a substantial proportion of patients (~50%) fail to 
attain adequate response to their initial antidepressant 
therapy. 5 ' 6 Compounding the difficulty of identifying success- 
ful treatment, antidepressant use is frequently associated with 
adverse drug reactions, with the inability to tolerate side 
effects being the most common reason for discontinuing 
antidepressant therapy. 7-10 Clinicians currently have no way 
to predict individual efficacy and side effect profiles, and trial 
and error switching often leaves MDD patients in psycholo- 
gical distress for weeks or months. Clearly, improved methods 
of patient-antidepressant matching to minimize side effects 
would greatly improve depression treatment. 

Previous studies have indicated that antidepressant 
response is substantially heritable, 11-14 suggesting pharma- 
cogenomics research as a promising avenue toward individuali- 
zing antidepressant treatment. Preliminary pharmacogenetic 
research has, for instance, suggested an important role for 



genes related to serotonin function in antidepressant side 
effects. 15-17 The serotonergic system is involved in the 
regulation of physiological functions that are often disturbed 
in antidepressant treatment, including neuroendocrine 
mechanisms regulating reproductive events such as sperma- 
togenesis, ovulation and sexual behavior. 15 ' 16 ' 18-20 However, 
to date, robust consistent evidence associating any specific 
candidate gene or polymorphism to antidepressant side 
effect response has been rare. This is due, at least in part, 
to the inherent shortcomings of candidate gene approaches, 
which are sharply restricted by current limitations in knowl- 
edge of depression neurobiology and antidepressant 
mechanism of action. Methods that systematically screen 
variants across the whole genome for association with 
antidepressant side effects are therefore critical to discover 
novel variants. Such approaches have recently begun to yield 
tangible results, with the number of replicated marker- 
disease associations increasing dramatically since the intro- 
duction of genome-wide association studies (GWAS), 21 and 
some initial applications in the context of psychiatric pharma- 
cogenomics 22-27 

In this paper, we use a GWAS approach to search for 
genetic variation affecting the susceptibility for citalopram- 
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induced side effects. Our study sample consists of the 1762 
MDD patients in the level 1 citalopram medication trial of the 
Sequenced Treatment Alternatives to Relieve Depression 
(STAR*D). 28 Analyses were performed on the five primary 
side effect dimensions indicated by factor analyses of a 
battery of side effect indicators. 28 

Materials and methods 

Study design and subjects. Subjects came from the 
STAR*D study, which has been described in detail 
elsewhere. 28 In short, STAR*D is a multistage trial of 
different treatment options for patients with nonpsychotic 
MDD. STAR*D enrolled a total of 4041 outpatients with 
MDD. 29 In level 1, all patients were given only citalopram. 
Those patients who did not have an efficacious response or 
could not tolerate side effects to citalopram were then 
randomized to other treatment options in other levels, 2a or 
2-4. Other treatment options included, but were not limited to, 
sertraline, buproprion, buspirone, cognitive therapy, various 
combinations of these treatments and using one of these 
treatments in combination with citalopram. To maximize 
power, this study focuses exclusively on the 1762 patients 
participating in the initial, level 1, citalopram-only trial, for 
whom genome-wide genotype data were collected. 

Clinical measures. Side effect presence and tolerance was 
measured by the Patient-Rated Inventory of Side Effects 
(PRISE). 28 For each of the eight biological systems assessed 
(gastrointestinal, heart, skin, nervous system, vision/hearing, 
genital/urinary, sleep and sexual functioning), patients were 
asked two types of questions, one relating to specific side 
effects (for example, dizziness, anorgasmia) and one relating 
to the overall tolerability of side effects for a given biological 
system. For the specific side effects, patients were asked to 
indicate whether the side effect was present (0 = no side 
effect; 1 =side effect present). There were four items for the 
gastrointestinal system, three for heart, four for skin, four for 
the nervous system, two for vision/hearing system, four for 
the genital/urinary system, two for sleep and three for sexual 
functioning. In addition, an overall score of side effect 
tolerability was given for each of the eight biological 
systems as a trichotomous item, scored: 0 = no side effects; 
1 = tolerable side effects; or 2 = distressing side effects 29 

Specifying side effect phenotypes. Specifying side effect 
phenotypes for GWAS proceeded in two steps. The first was 
to condense the 34 side effect indicators of the PRISE using 
factor analysis, thereby improving measurement through 
identifying the latent phenotypic constructs underlying the 
side effect indicator. 30,31 The second was to analyze 
longitudinal change in side effect phenotypes, to derive 
treatment effects for each drug based on the obtained factor 
scores. 

Identifying valid and reliable side effect measures was 
essential for optimizing power in the GWAS. This required 
data reduction, as using individual PRISE items as GWAS 
outcomes would provide minimal statistical power, given the 
presence of substantial measurement error in self-report 



measures combined with the inherently weak statistical 
power of dichotomous outcomes. 30 ' 33 Given the limitations 
of using individual items as GWAS outcomes, collapsing 
the items into symptom class sums corresponding to clinical 
experience (for example, central nervous system, gastro- 
intestinal) may seem an appealing, intuitive approach. 
However, preliminary analyses indicated that the internal 
consistency of many of these symptom class measures 
was poor, despite their intuitive appeal (for example, 
Cronbach's a = 0.61 for gastrointestinal items and =0.66 for 
central nervous system items). Thus, we employed a more 
rigorous psychometric approach to identify robust latent side 
effect constructs. 

Using Mplus 6.0, 34 exploratory factor analyses were 
conducted to empirically examine the factor structure of the 
34 side effect indicators of the PRISE. Of the several potential 
factors emerging from the exploratory analysis, five were 
retained based on overall fit to the data, interpretability and 
having a Cronbach's a >0.70, which indicates good reliability 
for the factor. 35 The five factors were: (1 ) a general side effect 
burden factor in which all of the 26 symptom measures 
served as factor indicators; (2) an overall tolerability factor 
based on the eight system-specific tolerability indicators; 
(3) a sexual factor; (4) a dizziness factor; and (5) a factor 
relating to vision\hearing (that is, oculoauricular) side 
effects. Once the optimal factor structure was determined, 
factor scores for each of the five factors were calculated 
using the standard regression scoring approach. The factor 
loadings and Cronbach's a for each factor are displayed 
in Table 1. 

Factor analysis results presented in Table 1 were based on 
all observations, irrespective of the time spent on a treatment. 
Given that side effects sometimes vary by duration of 
treatment with a specific drug, 36 additional factor analyses 
were performed, which divided the STAR*D sample based on 
the number of days on a given treatment. This was carried out 
to confirm that the factor solution was robust across treatment 
duration and not artifactual because of heterogeneity across 
time (see Supplementary Tables A1-A5). Another potentially 
confounding issue in evaluating side effects to antidepressant 
treatment is the presence of somatoform symptoms, which 
commonly co-occur with depression and may influence 
perception of the presence and severity of side effects, 
independently of actual side effects. 37 ' 38 Thus, the factor 
analyses discussed above were repeated controlling for 
baseline somatoform and hypochondriasis diagnoses, 
assessed using the Psychiatric Diagnostic Screening Ques- 
tionnaire 39 (see Supplementary Tables A6-A7). Comparable 
factors emerged with similar patterns of loadings when the 
sample was stratified by the number of days on drug and when 
controlling for somatoform symptoms, supporting the robust- 
ness of the factor structures. 

Generating treatment effect measures. To maximize 
power for pharmacogenomic GWAS, we developed a 
method to estimate medication treatment effects from all 
available information 32 using mixed modeling. 40 ' 41 Our 
method first determines the optimal functional form of 
overtime drug response, then screens many possible 
covariates to select those that improve the precision of the 
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Table 1 Factor loadings for side effect factors 



System 


Item 


GSE 


Tolerability 


Dizziness 


Sexual 


Vision\hearing 


Gl 


Diarrhea 
Constipation 
Dry mouth 
Nausea/vomiting 
Gl tolerability 


0.24 
0.27 
0.38 
0.31 


0.55 








Heart 


Palpitations 
Dizziness on standing 
Chest pain 
Heart tolerability 


0.29 
0.58 
0.35 


0.58 


0.83 
0.74 






Skin 


Rash 

Increased perspiration 

Itching 

Dry skin 

Skin tolerability 


0.23 
0.3 
0.38 
0.35 


0.49 








CNS 


Headache 
Tremors 

Poor coordination 
Dizziness 
CNS tolerability 


0.37 
0.32 
0.43 
0.56 


0.60 


0.68 






Vis\Hear 


Blurred vision 
Ringing in ears 
Eyes\ear tolerability 


0.44 
0.35 


0.51 






0.65 
0.63 
0.92 


GenMJrin 


Difficulty urinating 
Painful urination 
Menstrual irregularity 
Frequent urination 

OdlxUIIM. lUlfcM dUIII Ly 


0.25 
0.21 
0.08 
0.33 










Sleep 


Difficulty sleeping 
Sleeping too much 
Sleep tolerability 


0.3 
0.07 


0.42 








Sex 


Loss of sexual desire 
Trouble achieving orgasm 
Trouble with erections 
Sex tolerability 

a 


0.26 
0.18 
0.22 

0.74 


0.35 
0.72 


0.80 


0.65 
0.56 
0.44 
0.88 
0.71 


0.75 



Abbreviations: CNS, central nervous system; Gen, genital; Gl, gastrointestinal; GSE, general side effect burden; Vis vision. 



treatment effect estimates, and finally generates the 
individual treatment effect estimates based on the best 
fitting model using best linear unbiased predictors. 42 As this 
approach takes advantage of all available information in 
STAFTD, it results in more precise estimates than traditional 
approaches that estimate treatment effects using only two 
assessments (for example, subtracting pre- from post- 
treatment observations). 43 In addition, as treatment effects 
are based on mixed-model trajectory slopes, they are more 
robust to early dropout/discontinuation than competing 
approaches 4445 

To determine the optimal model of overtime drug response 
for each side effect outcome, we fit a series of models 
specifying linear change for a given number of days on drug 
and flat thereafter. This series began with a model assuming 
that maximal drug response was achieved at day 1. Each 
subsequent model specified an incrementally longer duration 
until maximal drug response was achieved, with the final 
model assuming that the drug effect did not plateau (that is, 
linear change throughout the trial). The function produced by 
the log likelihoods of this series was then optimized to 



determine the best estimate of the average number of days 
until maximal drug response. This duration varied across 
side effect outcomes with dizziness plateauing earliest 
(36 days on drug), and general side effect burden and overall 
tolerability plateauing latest (137 days on drug) (Supplemen- 
tary Figure B1). 

After determining the optimal functional form of overtime 
drug response, 55 covariates collected during STAFTD were 
screened to identify those that improved the precision of the 
treatment effect estimates, using a criterion based on 
reduction in residual error variance relative to treatment 
random effect variance. 23 ' 32 Screened covariates consisted of 
trial design characteristics, socio-demographic measures, 
clinical information, health-care access and reason for study 
exit (Supplementary Table B1 for full list). The number of 
selected covariates was 6 for vision/hearing and sexual side 
effects, 7 for general side effect burden and overall tolerability 
and 8 for dizziness. Design characteristics and concurrent 
psychiatric diagnoses (particularly drug abuse and hypochon- 
driasis) comprised the vast majority of selected covariates 
(Supplementary Table B2 for full list). 
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Finally, treatment effects were generated by employing a 
unique feature of the mixed-model random effects. Briefly, the 
mixed model estimates two types of parameters, coefficients 
that describe the predictors' average effects for the full sample 
(that is, fixed effects) and deviations from the average effects 
for each subject (that is, random effects). Thus, for each of the 
antidepressants investigated, we were able to output treat- 
ment effects as random drug effects. Intuitively, these 
treatment effects quantify how much each subject's side 
effect phenotype changes in response to a given drug, relative 
to the average effect for all subjects who took the drug. 
Treatment effects estimated in the manner described here 
have been published previously both in former GWAS of the 
currently analyzed STAFTD study 24 and several GWAS of the 
CAT IE antipsychotic clinical trial. 23,26 ' 46 ' 47 

Genotyping and quality control. Details of the STAFTD 
genotyping and quality control methods have been described 
previously 27 Briefly, approximately half the sample (N= 964) 
was genotyped on the Affymetrix Human Mapping 500k 
Array Set, and the other half was genotyped using the 
Affymetrix Genome-Wide Human Array 5.0 (A/=975). Both 
versions genotype the same set of SNPs, the only difference 
being that the later 5.0 version accomplishes this with one, 
rather than two, microarray/s. As preliminary examination 
indicated differences in data quality between the two 
platforms, we conducted additional data cleaning routines 
separately by platform, in PLINK. Subjects were excluded for 
low (<0.95) genotype call rates (3 and 85 subjects excluded 
in the 500K and 5.0 arrays, respectively), and SNPs were 
excluded for low (<0.01) minor allele frequency (14562 and 
12 012 SNPs excluded in the 500K and 5.0 arrays, 
respectively) and low genotype call rate (19113 and 54447 
SNPs excluded in the 500K and 5.0 arrays, respectively). 
SNPs were not excluded based solely on deviations from 
the Hardy-Weinberg equilibrium given the possibility 
of informative reasons for departures from the Hardy- 
Weinberg equilibrium 48,49 PLINK was also used to identify 
89 potential genotype-clinical sex disagreement, which were 
excluded from analysis. While the default PLINK sex 
disagreement settings (that is, X-chromosome inbreeding/ 
homozygosity estimate >0.8 for female and <0.2 for male) 
are conservative, they ensure that no unambiguous data are 
included in the analysis. After cleaning and merging, our 
analysis included 421 789 SNPs from 1762 subjects with a 
successful genotyping rate of 99.6%. 

Statistical analyses and multiple testing. All association 
testing was conducted in PLINK, 50 using a linear regression 
model of additive SNP effects. As STAFTD is an ethnically 
heterogeneous sample (79% Caucasian, 15% African 
American and 6% 'Other'), in the GWAS we adjusted for 
ancestral background, which can otherwise cause spurious 
associations due to population stratification. Specifically, we 
used the multi-dimensional scaling approach implemented 
in PLINK (http://www.pngu.mgh.harvard.edu/~purcell/plink/ 
strat.shtml#mds), which is essentially equivalent to principal 
component method implemented in Eigensoft. 51 Input data 
for the multi-dimensional scaling approach were the genome- 
wide average proportion of alleles shared identical by state 



between any two individuals. The first multi-dimensional 
scaling dimension from this genetic similarity matrix captures 
the maximal variance in the genetic similarity; the second 
dimension is orthogonal to the first and captures the 
maximum amount of residual genetic similarity, and so on. 
We included the first five ancestral multi-dimensional scaling 
dimensions as covariates in the GWAS, based on analyses 
showing that additional dimensions neither predicted 
response nor explained significant additional covariance 
between SNPS. 24 

We used a false discovery rate (FDR) 52 approach to declare 
significance. In comparison to controlling a family-wise error 
rate (for example, Bonferroni correction), FDR (a) provides a 
better balance between finding true effects versus controlling 
false discoveries, (b) results in comparable standards for 
declaring significance across studies because it does not 
directly depend on the number of tests, and (c) is relatively 
robust against having correlated tests. 53 FDR is commonly 
used in many high-dimensional applications and has been 
successfully applied in the context of GWAS. 54-56 We set an 
FDR threshold of 0.1 for declaring genome-wide significance. 
This specifies that, on average, 10% of the SNPs declared 
significant are expected to be false discoveries. In addition, 
we discuss suggestive associations at an FDR threshold of 
0.5 to reduce the probability of Type II statistical errors, while 
explicitly noting reduced confidence in these associations. 
Operationally, 57 FDR was controlled using g-values. Q-values 
are FDRs calculated using the P-value of the markers 
as thresholds for declaring significance, 58,59 and can be 
described as: 

q(p) = pFDR(p) = - ^oP 
wv^; ^ p r (p < p) 

_ 7i 0 p + (1 -teo) Pr(P<p|Hi) 

where n 0 is an estimate of the proportion of null associations, p 
is the observed P-value, P is a random normally distributed 
variable, H 0 is the null hypothesis that the SNP-side effect 
association p = 0 and is the alternative hypothesis that 
P^0. Thus, the numerator equals the product of the 
proportion of null associations x the observed P-value, and 
the denominator is the weighted sum of the probability of 
obtaining a test statistic at least as extreme as the one 
observed, given the null hypothesis, and of the probability of 
obtaining a test statistic at least as extreme as the one 
observed given the alternative hypothesis, weighted by the 
proportion of null associations (see Storey 58 for formal 
derivation). While n 0 may be empirically estimated, we have 
assumed the most conservative value 7c 0 =1; thus, the 
formula simplifies to the observed P-value divided by the 
probability of obtaining a test statistic at least as extreme as 
the one observed given the null hypothesis. 

For the most promising SNPs, we performed a variety of 
additional analyses to examine the robustness of the signal. 
First, we tested the SNPs separately in the subjects who self- 
identified as European Americans (EA) only and African 
American (AA) only (Supplementary Tables C1 and C2). In 
addition, for each SNP, we performed haplotype (proxy) 
analyses that incorporate information from other SNPs in 
that region (http://www.pngu.mgh.harvard.edu/~purcell/plink/ 
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Table 2 GWAS results with g-value <0.5 



Side effect Locus Test 





Chr 


Position 


SNPID 


Gene 


MAF 


MA 


N 


P -value 


q-value 


Effect 


Vision\hearing 


7 


100 902 714 


rs1 71 35437 


EMID2 


0.019 


A 


1675 


3.27E-08 


0.026 


+ 


Overall tolerability 


18 


6931 108 


rs4398173 


LAMA1 


0.460 


G 


1691 


5.99E-07 


0.315 


+ 


Overall tolerability 


18 


6931 662 


rs3810046 


LAMA1 


0.456 


T 


1700 


4.78E-07 


0.281 


+ 


Overall tolerability 


16 


13564386 


rs2903308 




0.461 


T 


1687 


1.69E-06 


0.496 


+ 


GSE 


5 


38488 651 


rs7715172 


EGFLAM 


0.071 


G 


1706 


2.91 E-06 


0.358 


+ 


GSE 


3 


59896 060 


rs4502542 


FHIT 


0.017 


T 


1690 


1 .62E-06 


0.262 


+ 


GSE 


13 


83028233 


rs6563353 




0.177 


A 


1666 


2.06E-06 


0.299 


+ 


GSE 


13 


104605 024 


rs1 6965962 




0.030 


T 


1706 


3.22E-07 


0.096 


+ 


GSE 


X 


116106291 


rs6646773 




0.076 


C 


1704 


1 .30E-06 


0.231 


+ 


GSE 


4 


187575 329 


rs1 1935103 




0.262 


T 


1705 


1 .33E-06 


0.234 


+ 


GSE 


3 


188896 685 


rs6764050 


RPT2 


0.023 


G 


1706 


8.73E-07 


0.182 


+ 


Dizziness 


20 


11 105011 


rs6040399 




0.308 


T 


1684 


6.89E-07 


0.216 


+ 


Dizziness 


2 


201 356884 


rs1 3430864 


AOX2P 


0.040 


G 


1671 


5.12E-07 


0.181 


+ 


Dizziness 


2 


201 360698 


rs1 3423450 


AOX2P 


0.040 


C 


1689 


7.57E-07 


0.228 


+ 



Abbreviations: Chr, chromosome number; effect, direction of the effect of minor allele; GSE, general side effect burden; GWAS, genome-wide association study; LD, 
linkage disequilibrium; MA, minor allele; MAF, minor allele frequency; N, sample size; SNP, single-nucleotide polymorphism. Locus information includes Chr and 
location of SNP (bp, Genome Build 36.3). Direction of effect is properly interpreted as '+', indicating that the MA is associated with increased side effect burden. Rows 
in bold indicate SNPs in high LD (t 2 >0.8) with each other. 



proxy.shtml). Such analyses may provide a technical valida- 
tion of the single SNP result or point to a particularly 
informative haplotype. 

Results 

Table 2 provides details on those SNPs that were genome- 
wide suggestive (g<0.5). Figure 1 shows quantile-quantile 
plots for all analyses. The plots show that the distribution of P- 
values from the GWAS are generally on a straight line, 
indicating the expected P-value distribution under the null 
hypothesis assuming no effects of the markers. However, in 
four of these five plots, there is also evidence that markers in 
the right upper corner have P-values smaller than would be 
expected under the null hypothesis, suggesting true associa- 
tion between these markers and the outcome variable. The 
plots also display k values (that is, the ratio of the median 
observed P-value of the distribution to the expected P-value 
under the null hypothesis) approximately equal to 1 
(X = 0.996-1 .033), indicating no systematic test statistic 
inflation and suggesting that population stratification was 
generally well-controlled. The full set of GWAS P-values 
considered in this study is available for download at: http:// 
www.people.vcu.edu/~ejvandenoord/. 

The top significant finding, rs1 71 35437, exhibited positive 
association between vision/hearing side effects and minor 
allele count. This SNP is located in an intron of EMID2 on 
chromosome 7, as described in Table 2 (P=3.27x 10~ 8 , 
q= 0.026). Examination of linkage disequilibrium (LD) in the 
surrounding region showed no other assayed SNPs in LD with 
this locus (P 2 <0.06). Consistent with this LD information, 
proxy haplotype tests showed that none of the adjacent SNPs 
evidenced more than modest association to citalopram- 
induced vision/hearing side effects (Figure 2a). Also, the 
SNP's minor allele frequency (MAF) is low in the study sample 
(MAF = 0.01 9), and somewhat greater for AA than EA. 
However, separating the sample by racial/ethnic ancestry 
and reanalyzing the SNP shows that the direction and 
magnitude of the effect is consistent between AA and EA. 



Thus, this association provides mixed evidence of true 
discovery, with the lack of the availability of SNPs in high LD 
(to check for possible technical errors) and low MAF 
suggesting a degree of caution, but the consistency across 
racial/ethnic groups offering some reassurance. 

Among the other strongest associations was a pair of SNPs 
in high LD (P 2 = 0.978) at LAMA 1, on chromosome 1 8. Both of 
these SNPs showed strong positive associations between 
overall tolerability to citalopram side effects and minor allele 
count (rs4398173: P=5.99x10 -7 , g=0.315; rs3810046: 
P=4.78x10 -7 , q=0.281). MAF was high for both SNPs 
(>0.45), and exhibited only minor differences across racial/ 
ethnic groups. Consequently, the direction, magnitude and 
significance of the associations were consistent in racial/ 
ethnic stratified reanalysis. Examination of the LD structure in 
the study sample showed modest, but significant LD between 
six assayed SNPs, spanning ~15kb. All SNPs identified in 
PLINK proxy haplotype analyses showed systematic associa- 
tion to the overall tolerability phenotype (P<0.05), but did not 
approach genome-wide significance, except in the case of the 
two markers described above. Of note, while the two SNPs 
approaching genome-wide significance are located slightly 
downstream (~10kb) of LAMA1, the haplotype that they 
represent clearly overlaps the gene boundary by ~5kb, 
suggesting that the reported hits may be associated with 
proximate regulatory sequence or a functional variant within 
the gene's downstream end (Figure 2b). 

The strongest signal for citalopram-induced dizziness 
comprised two highly proximate SNPs (~4kb apart) at 
AOX2P, on chromosome 2. Both of these SNPs showed 
strong positive associations between dizziness and minor 
allele count (rs13430864: P=5.12x10~ 7 , g=0.181; 
rs13423450: P=7.57x10" 7 , g=0.228). Racial/ethnic 
stratified analyses indicated different haplotype structures 
between EA and AA, with AA evidencing a ~6kb proxy 
haplotype including four assayed SNPs, but no discernable 
LD in the same region among EA. The association involving 
rs1 3430864 demonstrated consistent direction and effect 
magnitude across EA and AA. However, rs1 3423450, part of a 
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GSE Overall Tolerability Sexual Side Effects 




i 1 1 1 1 1 i 1 1 1 1 1 i 1 1 1 1 r 

012345 012345 012345 

Expected (-logP) Expected (-logP) Expected (-logP) 



Dizziness Vision/Hearing 




012345 012345 
Expected (-logP) Expected (-logP) 

Figure 1 Quantile-quantile (Q-Q) plots for genome-wide association studies (GWAS) results of five citalopram-induced side effect measures. Points represent -log 10 (P- 
values) for each single-nucleotide polymorphism (SNP)-side effect outcome association test. Red lines represent the expected P-value distribution under the null hypothesis of 
no true associations. Blue lines represent 95% confidence intervals for rejecting the null hypothesis at each P-value rank. The genomic inflation parameter (A) is defined as the 
ratio of the median observed P-value to the expected median under the null distribution, thus quantifying systematic test statistic inflation. 



common haplotype with rs1 3430864 in AA but not EA, only 
evidenced association among AA. This combined with 
moderate differences in MAF by racial/ethnic status suggests 
variability in the structure and function of this locus — thus, this 
association should be regarded as tentative, pending further 
evidence (Figure 2c). 

Suggestive associations for general side effect burden 
included rs7715172, at EGFLAM, on chromosome 5 
(P=2.91 x 1CT 6 , q= 0.358). Reanalyzing the data separately 
by race/ethnicity indicated that the association was in the 
same direction and magnitude in both subsamples. Examina- 
tion of LD structure indicated the presence of a relatively large 
haplotype, with seven adjacent assayed SNPs in moderate 
LD ( = 0.11-0.59) to the implicated SNP. The haplotype 
spanned ~5kb, and three assayed SNPs in the haplotype 
were located within the gene boundary of EGFLAM (Supple- 
mentary Figure D1). Consistent with this LD structure, proxy 
haplotype analyses indicated six of the surrounding SNPs to 
exhibit moderate association to the general side effect burden 
(P<0.05). However, MAF was relatively low ( = 0.071) and 
moderate racial/ethnic difference were evident; thus, this 
finding should be considered suggestive, conditional on 
further evidence. Additional suggestive associations 
were found between general side effect burden and SNPs at 
RPT2 (rs6764050: P=8.73x10~ 7 , g = 0.182) and FHIT 
(rs4502542: P=1.62x 10~ 6 , q= 0.262), as well as several 
intergenic SNPs (Supplementary Figures D1 and D2). Among 



the intergenic associations was a genome-wide significant 
finding at rs1 6965962, located in a gene desert on chromo- 
some 7 (P= 3.22 x 10~ 7 , q= 0.096). As shown in the regional 
plot (Figure 2d), this intergenic SNP represented a distinct 
susceptibility haplotype spanning ~30 kb, with eight assayed 
SNPs showing clear, systematic association general side 
effect burden. 

Discussion 

To maximize the benefit of antidepressant therapy to the 
individual patient, not only should the most efficacious drug be 
prescribed at the time of first presentation, but also the drug 
with the minimal side effect profile. Understanding individual 
differences in the development of side effects following 
antidepressant therapy is therefore essential to personalizing 
the treatment of depression. In this study, we performed 
GWAS on five side effect factors, including general side effect 
burden, overall tolerability, sexual adverse reactions, dizzi- 
ness and vision/hearing-related side effects. We detected two 
SNPs, which according to our pre-identified criteria (FDR 
controlled at q<0.1) can be considered genome-wide 
significant. 

Our top genome-wide finding involved rs1 71 35437, an SNP 
within the gene EMID2, with minor allele count positively 
associated with the severity of citalopram vision/hearing side 
effects. EMID2 encodes the protein collagen oc-l(XXVI) chain 
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Figure 2 Regional plots for genome-wide association study (GWAS) results of (a) EMID2, (b) LAMA1, (c) AOX2P and (d) intergenic rs1 6965962 associations. Points 
represent -log 10 (P-values) for association tests and are color coded to denote linkage disequilibrium to the target single-nucleotide polymorphism (SNP) in the HapMap 
Phase II reference data. Recombination rate is represented by the light blue lines. The plots' bottom panels show the names and locations of genes in the UCSC Genome 
Browser, with exon positions denoted by cross-hatches and transcription direction by arrows. 80 



in humans, 60 which has been found to regulate corneal 
collagen fibrillogenesis. 61 Variation in the gene has also been 
associated with ataxia in neurodegenerative disorders. 62 
Moreover, EMID2 has shown evidence of pharmacogenetic 
side effect moderation, albeit to a pharmacodynamically 
distinct phenotype, aspirin-induced asthma. 63 On balance, 
considering the strength of association, links with ocular and 
neurocognitive functions and tentative evidence of pharma- 
cogenetic activity, rs1 71 35437, may be considered a reason- 
able candidate in future pharmacogenomic studies of 
selective serotonin-reuptake inhibitors. 

The associations of two high-LD LAMA1 SNPs to overall 
tolerability also suggest credible biological mechanisms. 
LAMA1 encodes a vital component of laminin proteins — 
laminin subunit oc-1. Binding to cells via a high-affinity 
receptor, laminin is thought to mediate cell attachment, 
migration and differentiation of cells into tissues during 
embryonic development by interacting with other extracellular 
matrix components, as well as maintenance of tissue 
phenotype and promotion of tissue survival. 64 ' 65 This rela- 
tively frequently studied gene has been linked to several 
cancers including non-small-cell lung and colorectal, 66 ' 67 as 
well as cardiovascular pathophysiology. 68 LAMA1 has also 
been associated with pharmacogenetic moderation of thiazo- 
lidinedione-induced side effects to diabetes treatment and, 69 
more relevantly, with moderating the effects of antidepressant 
buproprion on smoking cessation. 70 Thus, robust GWAS 
support and prior evidence of antidepressant moderation 
suggest LAMA1 as a plausible candidate for future pharma- 
cogenetic and functional analyses. 



The two high-LD candidates at pseudogene AOX2P, on 
chromosome 2, are relatively poorly characterized, but 
variants within the pseudogene have been identified and 
analyzed in two bioinformatic studies based on a targeted 
nonsynonymous SNP approach, 71 and a large-scale identifi- 
cation and characterization of putative alternative promoters 
of human genes. 72 Given that pseudogenes, by definition, 
lack protein-coding function, this finding should be viewed with 
a degree of skepticism. However, considering that functional 
genes have been misclassified as pseudogenes, 73 pseudo- 
gene transcripts have demonstrated trans-regulation of 
homologous coding genes, 74 and some endogenous small 
interfering RNA are derived from pseudogene transcription, 75 
it would be premature to dismiss this result as a false positive. 

Similarly, we view the association of variants in EGFLAMXo 
citalopram-induced general side effect burden as tentative 
because of MAF differences between racial/ethnic groups. 
However, there are potential side effect mechanisms involving 
EGFLAM suggested by previous research, including studies 
of ocular structure and function (for example, murine eye 
pathophysiology 76 and photoreceptor ribbon synapse forma- 
tion). 77 Thus, this gene remains a plausible candidate for 
future consideration. Finally, the second genome-wide sig- 
nificant SNP (rs1 6965962) was located in a gene desert on 
chromosome 7. While the lack of involvement in protein- 
coding sequence somewhat diminishes this marker's prior 
probability of being a true discovery, there is precedent for 
such intergenic associations to replicate in independent 
samples, such as found with a region on chromosome 9 in 
Type 2 diabetes mellitus. 78 Furthermore, the association 
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signal here was notably robust, including eight assayed SNPs 
spanning a ~30kb haplotype (Figure 2d). Thus, this finding 
may be evidence of long-range regulatory elements, consis- 
tent with recent research identifying gene desert enhancer 
hotspots associated with coronary artery disease. 79 

Beyond individual findings, it is also worth noting that all top 
associations exhibited positive effects of minor allele count 
and side effect severity, rather than the expected mix of 
positive and negative effects (Table 2). Further inspection of 
the GWAS output confirmed the legitimacy of this finding, and 
demonstrated that, as expected, negative and positive 
coefficients were equally likely in the full GWAS results. The 
pattern was largely supported in the ethnically stratified 
analyses as well, where all coefficients, except general side 
effect burden — rs1 6965962, were positive. Although the 
binomial probability of arriving at this result by chance is 
small, the lack of precedent in previous pharmacogenomic 
research suggests that the finding may be due to sampling 
error. However, it is also possible that the result is due to 
uncommon polymorphisms having an increased probability of 
deleterious effect. Future research will be required to 
adjudicate between these possibilities. 

Currently, it is premature to suggest direct clinical applica- 
tions of these findings for prescribing antidepressants. On the 
contrary, actualizing the promise of pharmacogenomics and 
translating academic findings into clinical applications will 
require a cumulative process of aggregating and jointly 
considering large bodies of evidence using meta-analytic and 
data integration techniques. Thus, it is crucial to conduct and 
rapidly disseminate GWAS results from large, well-designed 
clinical trials with genomic data, such as STAFTD. To facilitate 
this process, we provide all P-values (http://www.people. 
vcu.edu/~ejvandenoord/) as a resource for investigators with 
the requisite samples to carry out replication or meta-analysis. 

As with any genetic associations, our findings will require 
replication and functional validation. However, this study 
shows the potential of GWAS to discover genes and pathways 
that mediate adverse effects of antidepressant medication. A 
better understanding of these mechanisms and the roles of 
specific polymorphisms will facilitate the development of 
improved biomarker-based approaches to personalize anti- 
depressant therapy. It is hoped that this research will 
eventually contribute, however incrementally, to reducing 
the global health burden of depression, facilitating efficient 
prescription of the most efficacious and least toxic antide- 
pressant medication to MDD patients. 
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